Abstract

In this note we investigate the complexity of the Minimum Label Alignment problem and we show that such a problem is APX-hard.

1 Introduction

In this note we consider the computational (and approximation) complexity of the Minimum Label Alignment problem. This problem has been recently introduced in bioinformatics to deal with the inference of evolutionary scenarios for genome organization. In this note we show that the problem is APX-hard, even when the genome contains at most five occurrences of the same gene.

This result implies that the Duplication-Loss Alignment problem and the Two Species Small Phylogeny problem introduced in [3] do not admit a polynomial-time approximation scheme, unless P = NP.

Next, we introduce some preliminary definitions. A genome is considered as a string over an alphabet $\Sigma$. The $i$-th character of a genome $X$ is denoted by $X_i$. Two aligned genomes $X$, $Y$ are two aligned strings over the alphabet $\Sigma^- = \Sigma \cup \{-\}$ (where $-$ denotes a gap in the alignment) such that if $X_i \neq -$ and $Y_i \neq -$, then $X_i = Y_i$, and $X_i$, $Y_i$ cannot both be equal to $-$. Two aligned genomes can be seen as a matrix of size $2 \times m$ (where $m$ is the size of the alignment). A column is a match if it does not contain a gap. A labeling of an aligned genome $X$ is an interpretation of the unmatched characters of $X$ in terms of duplications and losses. A duplication can be represented as a directed arc from a substring of $X$ to a different, identical substring of $X$. A labeling is feasible if it induces no cycle. Consider a duplication in $X$ from a substring $s$ to a substring $t$. Such a duplication is called maximal if $s$ and $t$ are two identical maximal substrings in $X$, that is, if the characters on the left of $s$ and $t$ in $X$ are different (or one of these characters does not exist) and the characters on the right of $s$ and $t$ in $X$ are different (or one of these characters does not exist).
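As an illustration of the definition of a maximal duplication, the following Python sketch checks whether a duplication between two occurrences of the same substring is maximal. It assumes, for illustration only, that the genome is given as a Python sequence of symbols and that the two occurrences are identified by their 0-based start positions; the function name is hypothetical.

```python
def is_maximal_duplication(x, s, t, length):
    """Return True if the duplication between x[s:s+length] and x[t:t+length] is maximal.

    x is a genome given as a sequence of symbols (e.g. a string or a list);
    s and t are the 0-based start positions of the two occurrences.
    """
    if s == t or length <= 0 or x[s:s + length] != x[t:t + length]:
        return False  # not two different, identical substrings
    # The characters on the left must differ, or one of them must not exist.
    left_ok = s == 0 or t == 0 or x[s - 1] != x[t - 1]
    # The characters on the right must differ, or one of them must not exist.
    end_s, end_t = s + length, t + length
    right_ok = end_s == len(x) or end_t == len(x) or x[end_s] != x[end_t]
    return left_ok and right_ok


print(is_maximal_duplication("abcxabcy", 0, 4, 3))  # True: "abc" cannot be extended
print(is_maximal_duplication("abcxabcy", 0, 4, 2))  # False: "ab" extends to "abc"
```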

Given a cost function $c$ that defines the cost of the possible operations considered (duplications and losses), the cost $c(L)$ of a labeling $L$ of $X$ is the sum of the costs of the underlying operations.
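Feasibility and cost can be made concrete with a small sketch. The tuple encoding of operations, the helper names, and in particular the reading of "induces no cycle" used here (an arc from duplication A to duplication B whenever the target of A overlaps the source of B, with no directed cycle allowed) are assumptions made for illustration; the default costs are the ones used in the reduction of Section 2.

```python
def overlaps(a, la, b, lb):
    """True if the intervals [a, a+la) and [b, b+lb) intersect."""
    return a < b + lb and b < a + la


def is_feasible(ops):
    """ops: ("D", src, dst, length) duplications and ("L", pos, length) losses.

    Assumed reading of feasibility: draw an arc from duplication A to
    duplication B when the target of A overlaps the source of B; the
    labeling is feasible when these arcs form no directed cycle.
    """
    dups = [op for op in ops if op[0] == "D"]
    succ = [[j for j, (_, src, _, ln) in enumerate(dups)
             if overlaps(dups[i][2], dups[i][3], src, ln)]
            for i in range(len(dups))]
    WHITE, GREY, BLACK = 0, 1, 2
    colour = [WHITE] * len(dups)

    def visit(i):  # depth-first search; reaching a GREY node closes a cycle
        colour[i] = GREY
        for j in succ[i]:
            if colour[j] == GREY or (colour[j] == WHITE and not visit(j)):
                return False
        colour[i] = BLACK
        return True

    return all(colour[i] != WHITE or visit(i) for i in range(len(dups)))


def labeling_cost(ops, dup_cost=lambda z: 1, loss_cost=lambda z: z):
    """Sum of the operation costs; defaults: c(D(z)) = 1 and c(L(z)) = z."""
    return sum(dup_cost(op[3]) if op[0] == "D" else loss_cost(op[2]) for op in ops)
```

For instance, copying positions 0..2 onto 4..6 and back again is rejected as cyclic, while a chain of copies is accepted.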

We investigate the complexity of the following problem. 

Problem 1. Minimum Labeling Alignment [MLA]

Input: two aligned genomes $X$ and $Y$.

Output: a minimum cost feasible labeling $L$ of $X$ and $Y$.

In what follows, given a graph $G = (V, E)$ and a vertex $v \in V$, we denote by $N(v)$ the set of vertices adjacent to $v$ in $G$. A graph $G = (V, E)$ is cubic when $|N(v)| = 3$ for each $v \in V$.

2 Hardness of Minimum Labeling Alignment 

We prove that the MLA problem is APX-hard, even if each symbol (gene) has at most 5 occurrences in $X$ or $Y$, by giving a reduction (more precisely, an L-reduction [2]) from the Minimum Vertex Cover problem on Cubic graphs (MVCC) to MLA. Notice that MVCC is known to be APX-hard [1].

Problem 2. Minimum Vertex Cover Problem on Cubic graphs [MVCC]
Input: a cubic graph $G = (V, E)$, where $V = \{v_1, \dots, v_n\}$.
Output: a minimum cardinality set $V' \subseteq V$ such that for each $\{v_i, v_j\} \in E$, at least one of $v_i$, $v_j$ belongs to $V'$.
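For small instances, the two notions used by MVCC can be checked directly. The sketch below (illustrative function names; the exhaustive search takes exponential time, so it is only meant for toy graphs such as the one of Example 2.1) tests whether a graph is cubic and computes a minimum vertex cover.

```python
from itertools import combinations


def is_cubic(vertices, edges):
    """True if every vertex has exactly three neighbours (simple graph assumed)."""
    degree = {v: 0 for v in vertices}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return all(d == 3 for d in degree.values())


def is_vertex_cover(cover, edges):
    """True if every edge has at least one endpoint in cover."""
    return all(a in cover or b in cover for a, b in edges)


def min_vertex_cover(vertices, edges):
    """Exhaustive search over subsets of increasing size (exponential time)."""
    for k in range(len(vertices) + 1):
        for cand in combinations(vertices, k):
            if is_vertex_cover(set(cand), edges):
                return set(cand)


V = [1, 2, 3, 4]
E = list(combinations(V, 2))  # the complete graph on four vertices, which is cubic
print(is_cubic(V, E), min_vertex_cover(V, E))  # True {1, 2, 3}
```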

Next, we present the L-reduction from MVCC to MLA. Let $G = (V, E)$ be a cubic graph. Define the following ordering on the edges in $E$ (where, for the purpose of the ordering, each edge $\{v_i, v_j\}$ is written with $i < j$): $\{v_i, v_j\} < \{v_x, v_y\}$ if and only if $i < x$, or $i = x$ and $j < y$. We denote by $\{v_1, v_a\}$ and $\{v_z, v_w\}$ the first and the last edges of $E$. Based on this ordering, we refer to the edges incident on $v_i$ as the first, the second and the third edge of $v_i$. Furthermore, in what follows, given a vertex $v_i \in V$, we denote by $\{v_i, v_j\}$, $\{v_i, v_h\}$, $\{v_i, v_k\}$ the three edges of $G$ incident on $v_i$.
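The edge ordering and the resulting notion of first, second and third edge of a vertex can be computed as in the following sketch (hypothetical function name; vertices are identified by their indices, and each edge is normalised so that its smaller index comes first).

```python
def edge_ranks(edges):
    """Order the edges as in the reduction and compute their ranks.

    Returns the ordered edge list and a dict mapping each edge (i, j), i < j,
    to (p, q): the edge is the p-th edge of v_i and the q-th edge of v_j.
    """
    ordered = sorted(tuple(sorted(e)) for e in edges)  # i < x, or i == x and j < y
    count = {}  # number of edges of each vertex seen so far
    ranks = {}
    for i, j in ordered:
        count[i] = count.get(i, 0) + 1
        count[j] = count.get(j, 0) + 1
        ranks[(i, j)] = (count[i], count[j])
    return ordered, ranks


ordered, ranks = edge_ranks([(3, 4), (2, 1), (1, 3), (4, 1), (2, 3), (2, 4)])
print(ordered[0], ordered[-1])  # (1, 2) (3, 4): the first and the last edge of E
print(ranks[(2, 4)])            # (3, 2): third edge of v_2, second edge of v_4
```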

Now, we define the corresponding aligned genomes $X$ and $Y$. First, we present an overview of the construction of $X$ and $Y$. The aligned genomes $X$ and $Y$ consist of two parts, and each part is divided into blocks (that is, substrings): the leftmost part is called the Vertex-Edge-Set Part (VE-part), and the rightmost part is called the Auxiliary Part (A-part) (see Fig. 1).

In the VE-part each position of $X$ is different from $-$, while $Y$ contains some gaps. Each position of $X$ and $Y$ in the A-part is a match, hence $X$ and $Y$ are identical in the A-part. By construction, each position of the aligned genome $Y$ is either a gap or a match, hence the characters of $Y$ do not need any labeling. It follows that a labeling of $X$ and $Y$ is computed by labeling the unmatched characters in the VE-part of $X$.

$$X = \underbrace{B_{X\text{-}VE}(v_1) \cdots B_{X\text{-}VE}(v_n)\, B_{X\text{-}VE}(e_{1,a}) \cdots B_{X\text{-}VE}(e_{z,w})}_{\text{VE-part}}\; \underbrace{B_{X,A,1}(v_1)\, B_{X,A,2}(v_1) \cdots B_{X,A,1}(v_n)\, B_{X,A,2}(v_n)}_{\text{A-part}}$$

$$Y = \underbrace{B_{Y\text{-}VE}(v_1) \cdots B_{Y\text{-}VE}(v_n)\, B_{Y\text{-}VE}(e_{1,a}) \cdots B_{Y\text{-}VE}(e_{z,w})}_{\text{VE-part}}\; \underbrace{B_{Y,A,1}(v_1)\, B_{Y,A,2}(v_1) \cdots B_{Y,A,1}(v_n)\, B_{Y,A,2}(v_n)}_{\text{A-part}}$$

Figure 1: The structure of $X$ and $Y$.

The VE-part of $X$ and $Y$ consists of the concatenation of $|V| + |E|$ blocks (see Fig. 1): one block $B_{X\text{-}VE}(e_{i,j})$ ($B_{Y\text{-}VE}(e_{i,j})$, respectively) in $X$ (in $Y$, respectively) for each edge $\{v_i, v_j\} \in E$, and one block $B_{X\text{-}VE}(v_i)$ ($B_{Y\text{-}VE}(v_i)$, respectively) in $X$ (in $Y$, respectively) for each vertex $v_i \in V$.

The A-part of $X$ and $Y$ consists of the concatenation of $2|V|$ blocks (see Fig. 1): two blocks $B_{X,A,1}(v_i)$, $B_{X,A,2}(v_i)$ ($B_{Y,A,1}(v_i)$, $B_{Y,A,2}(v_i)$, respectively) in $X$ (in $Y$, respectively) for each $v_i \in V$.

Now, we define the specific values of the blocks of $X$ and $Y$. Given an edge $\{v_i, v_j\} \in E$, where $i < j$ and $\{v_i, v_j\}$ is the $p$-th edge of $v_i$ and the $q$-th edge of $v_j$, $1 \le p \le 3$ and $1 \le q \le 3$, we define its associated blocks $B_{X\text{-}VE}(e_{i,j})$ and $B_{Y\text{-}VE}(e_{i,j})$. The block $B_{X\text{-}VE}(e_{i,j})$ is defined as follows:

$$B_{X\text{-}VE}(e_{i,j}) = s_{e,i,j}\, x_{i,p}\, e_{i,j,1}\, e_{i,j,2}\, x_{j,q}$$

The block $B_{Y\text{-}VE}(e_{i,j})$ is defined as follows:

$$B_{Y\text{-}VE}(e_{i,j}) = s_{e,i,j}\, (-)^4$$
Hence notice that $B_{X\text{-}VE}(e_{i,j})$ contains 4 unmatched characters, that is, the substring $x_{i,p}\, e_{i,j,1}\, e_{i,j,2}\, x_{j,q}$.
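A direct transcription of the two edge blocks is sketched below. The string encoding of symbols (for instance `x_1_2` for $x_{1,2}$ and `s_e_1_2` for $s_{e,1,2}$) and the function names are illustrative assumptions, not part of the construction.

```python
GAP = "-"


def sym(name, *idx):
    """Illustrative symbol naming: sym("x", 1, 2) -> "x_1_2"."""
    return "_".join([name, *map(str, idx)])


def edge_blocks(i, j, p, q):
    """B_X-VE(e_ij) and B_Y-VE(e_ij) for the edge {v_i, v_j}, i < j,
    which is the p-th edge of v_i and the q-th edge of v_j."""
    bx = [sym("s_e", i, j), sym("x", i, p), sym("e", i, j, 1), sym("e", i, j, 2), sym("x", j, q)]
    by = [sym("s_e", i, j)] + [GAP] * 4  # s_{e,i,j} followed by four gaps
    return bx, by


bx, by = edge_blocks(1, 2, 1, 1)
unmatched = [a for a, b in zip(bx, by) if b == GAP]
print(unmatched)  # ['x_1_1', 'e_1_2_1', 'e_1_2_2', 'x_2_1']
```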

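Now, we define the block $B_{X\text{-}VE}(v_i)$, with $v_i \in V$. The $i$-encoding of $\{v_i, v_j\}$ (with $i < j$), $i\text{-}enc_{i,j}$, is defined as follows:

$$i\text{-}enc_{i,j} = x_{i,p}\, e_{i,j,1}\, e_{i,j,2}$$

and $i\text{-}enc_{i,j}[l] = x_{i,p}$, $i\text{-}enc_{i,j}[r] = e_{i,j,1}\, e_{i,j,2}$. The $j$-encoding of $\{v_i, v_j\}$, $j\text{-}enc_{i,j}$, is defined as follows:

$$j\text{-}enc_{i,j} = e_{i,j,1}\, e_{i,j,2}\, x_{j,q}$$

and $j\text{-}enc_{i,j}[l] = e_{i,j,1}\, e_{i,j,2}$, $j\text{-}enc_{i,j}[r] = x_{j,q}$. In the blocks associated with a vertex $v_i$, $i\text{-}enc_{i,j}$ denotes the encoding of the edge $\{v_i, v_j\}$ with respect to $v_i$; when $v_i$ is the endpoint with the larger index, this is the encoding defined above for the second endpoint of the edge.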
The block $B_{X\text{-}VE}(v_i)$ is defined as follows:

$$B_{X\text{-}VE}(v_i) = s_i\, z_{i,1}\, z_{i,2}\; i\text{-}enc_{i,j}\; z_{i,3}\, z_{i,4}\; i\text{-}enc_{i,h}\; z_{i,5}\, z_{i,6}\; i\text{-}enc_{i,k}\; z_{i,7}\, z_{i,8}$$

The block $B_{Y\text{-}VE}(v_i)$ is defined as follows:

$$B_{Y\text{-}VE}(v_i) = s_i\, (-)^{17}$$

Hence notice that $B_{X\text{-}VE}(v_i)$ contains 17 unmatched characters, that is, the substring $z_{i,1}\, z_{i,2}\; i\text{-}enc_{i,j}\; z_{i,3}\, z_{i,4}\; i\text{-}enc_{i,h}\; z_{i,5}\, z_{i,6}\; i\text{-}enc_{i,k}\; z_{i,7}\, z_{i,8}$.
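The vertex blocks, together with the encodings they are built from, can be sketched in the same style. Again, the symbol naming and the function names are assumptions for illustration; `encoding(v, a, b, p, q)` returns the encoding of the edge $\{v_a, v_b\}$ ($a < b$) with respect to its endpoint $v$, that is, $a\text{-}enc_{a,b}$ or $b\text{-}enc_{a,b}$.

```python
GAP = "-"


def sym(name, *idx):
    """Illustrative symbol naming: sym("z", 1, 2) -> "z_1_2"."""
    return "_".join([name, *map(str, idx)])


def encoding(v, a, b, p, q):
    """Encoding of the edge {v_a, v_b}, a < b (p-th edge of v_a, q-th of v_b), w.r.t. v."""
    e1, e2 = sym("e", a, b, 1), sym("e", a, b, 2)
    if v == a:
        return [sym("x", a, p), e1, e2]  # a-enc_{a,b} = x_{a,p} e_{a,b,1} e_{a,b,2}
    return [e1, e2, sym("x", b, q)]      # b-enc_{a,b} = e_{a,b,1} e_{a,b,2} x_{b,q}


def vertex_blocks(i, incident):
    """B_X-VE(v_i) and B_Y-VE(v_i); incident lists the first, second and third
    edges of v_i as tuples (a, b, p, q)."""
    bx = [sym("s", i)]
    for t, (a, b, p, q) in enumerate(incident):
        bx += [sym("z", i, 2 * t + 1), sym("z", i, 2 * t + 2)]
        bx += encoding(i, a, b, p, q)
    bx += [sym("z", i, 7), sym("z", i, 8)]
    by = [sym("s", i)] + [GAP] * 17  # s_i followed by 17 gaps
    return bx, by


# v_1 of Example 2.1: {v_1,v_2}, {v_1,v_3}, {v_1,v_4} are its 1st, 2nd, 3rd edges
# and the 1st edge of v_2, v_3, v_4 respectively.
bx, by = vertex_blocks(1, [(1, 2, 1, 1), (1, 3, 2, 1), (1, 4, 3, 1)])
print(len(bx) - 1)  # 17 unmatched characters
```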

Now, we define the A-part of $X$ and $Y$. Recall that $X$ and $Y$ are identical in the A-part. The block $B_{X,A,1}(v_i)$ is defined as follows:

$$B_{X,A,1}(v_i) = w_{i,1}\, z_{i,1}\, z_{i,2}\, w_{i,2}\, z_{i,3}\, z_{i,4}\, w_{i,3}\, z_{i,5}\, z_{i,6}\, w_{i,4}\, z_{i,7}\, z_{i,8}$$

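The block $B_{Y,A,1}(v_i)$ is identical to $B_{X,A,1}(v_i)$.
The block $B_{X,A,2}(v_i)$ is defined as follows:

$$\begin{aligned} B_{X,A,2}(v_i) = {} & u_{i,1}\, z_{i,2}\; i\text{-}enc_{i,j}[l]\; u_{i,2}\; i\text{-}enc_{i,j}[r]\; z_{i,3}\; u_{i,3}\, z_{i,4}\; i\text{-}enc_{i,h}[l]\; u_{i,4}\; i\text{-}enc_{i,h}[r]\; z_{i,5} \\ & u_{i,5}\, z_{i,6}\; i\text{-}enc_{i,k}[l]\; u_{i,6}\; i\text{-}enc_{i,k}[r]\; z_{i,7} \end{aligned}$$

The block $B_{Y,A,2}(v_i)$ is identical to $B_{X,A,2}(v_i)$.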
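The two auxiliary blocks of a vertex can be generated from the $[l]$ and $[r]$ parts of its three encodings. In this sketch (illustrative names and symbol encoding, as before), `parts` lists, for the first, second and third edge of $v_i$, the pair of parts of its encoding with respect to $v_i$; the usage lines check, for $v_1$ of Example 2.1, that the six substrings copied by a type b labeling (Remark 2.1) occur in $B_{X,A,2}(v_1)$.

```python
def sym(name, *idx):
    """Illustrative symbol naming: sym("w", 1, 2) -> "w_1_2"."""
    return "_".join([name, *map(str, idx)])


def a_part_blocks(i, parts):
    """B_X,A,1(v_i) and B_X,A,2(v_i); parts holds three (enc[l], enc[r]) pairs."""
    def z(k):
        return sym("z", i, k)

    a1 = []
    for t in range(4):  # w_{i,t+1} z_{i,2t+1} z_{i,2t+2}
        a1 += [sym("w", i, t + 1), z(2 * t + 1), z(2 * t + 2)]
    a2 = []
    for t, (left, right) in enumerate(parts):
        # u z enc[l] u enc[r] z, with all indices shifted by two for each edge
        a2 += [sym("u", i, 2 * t + 1), z(2 * t + 2)] + left
        a2 += [sym("u", i, 2 * t + 2)] + right + [z(2 * t + 3)]
    return a1, a2


def occurs(block, piece):
    """True if piece appears as a contiguous substring of block."""
    n = len(piece)
    return any(block[k:k + n] == piece for k in range(len(block) - n + 1))


parts_v1 = [(["x_1_1"], ["e_1_2_1", "e_1_2_2"]),
            (["x_1_2"], ["e_1_3_1", "e_1_3_2"]),
            (["x_1_3"], ["e_1_4_1", "e_1_4_2"])]
a1, a2 = a_part_blocks(1, parts_v1)
type_b_pieces = [["z_1_2", "x_1_1"], ["e_1_2_1", "e_1_2_2", "z_1_3"],
                 ["z_1_4", "x_1_2"], ["e_1_3_1", "e_1_3_2", "z_1_5"],
                 ["z_1_6", "x_1_3"], ["e_1_4_1", "e_1_4_2", "z_1_7"]]
print(all(occurs(a2, piece) for piece in type_b_pieces))  # True
```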

Example 2.1. A cubic graph $G = (V, E)$ and the corresponding genome $X$.

[Figure: the cubic graph $G$, with vertices $v_1, v_2, v_3, v_4$.]




First, we define the blocks $B_{X\text{-}VE}(e_{i,j})$ associated with the edges $\{v_i, v_j\} \in E$:

• $B_{X\text{-}VE}(e_{1,2}) = s_{e,1,2}\, x_{1,1}\, e_{1,2,1}\, e_{1,2,2}\, x_{2,1}$

• $B_{X\text{-}VE}(e_{1,3}) = s_{e,1,3}\, x_{1,2}\, e_{1,3,1}\, e_{1,3,2}\, x_{3,1}$

• $B_{X\text{-}VE}(e_{1,4}) = s_{e,1,4}\, x_{1,3}\, e_{1,4,1}\, e_{1,4,2}\, x_{4,1}$

• $B_{X\text{-}VE}(e_{2,3}) = s_{e,2,3}\, x_{2,2}\, e_{2,3,1}\, e_{2,3,2}\, x_{3,2}$

• $B_{X\text{-}VE}(e_{2,4}) = s_{e,2,4}\, x_{2,3}\, e_{2,4,1}\, e_{2,4,2}\, x_{4,2}$

• $B_{X\text{-}VE}(e_{3,4}) = s_{e,3,4}\, x_{3,3}\, e_{3,4,1}\, e_{3,4,2}\, x_{4,3}$

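Now, in order to define the blocks $B_{X\text{-}VE}(v_i)$, with $v_i \in V$, we have to define the encodings $i\text{-}enc_{i,j}$ and $j\text{-}enc_{i,j}$:

• $1\text{-}enc_{1,2} = x_{1,1}\, e_{1,2,1}\, e_{1,2,2}$; $2\text{-}enc_{1,2} = e_{1,2,1}\, e_{1,2,2}\, x_{2,1}$

• $1\text{-}enc_{1,3} = x_{1,2}\, e_{1,3,1}\, e_{1,3,2}$; $3\text{-}enc_{1,3} = e_{1,3,1}\, e_{1,3,2}\, x_{3,1}$

• $1\text{-}enc_{1,4} = x_{1,3}\, e_{1,4,1}\, e_{1,4,2}$; $4\text{-}enc_{1,4} = e_{1,4,1}\, e_{1,4,2}\, x_{4,1}$

• $2\text{-}enc_{2,3} = x_{2,2}\, e_{2,3,1}\, e_{2,3,2}$; $3\text{-}enc_{2,3} = e_{2,3,1}\, e_{2,3,2}\, x_{3,2}$

• $2\text{-}enc_{2,4} = x_{2,3}\, e_{2,4,1}\, e_{2,4,2}$; $4\text{-}enc_{2,4} = e_{2,4,1}\, e_{2,4,2}\, x_{4,2}$

• $3\text{-}enc_{3,4} = x_{3,3}\, e_{3,4,1}\, e_{3,4,2}$; $4\text{-}enc_{3,4} = e_{3,4,1}\, e_{3,4,2}\, x_{4,3}$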



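[Figure: a type a labeling for $B_{X\text{-}VE}(v_1)$ (in the upper part), with duplications from $B_{X\text{-}VE}(e_{1,2})$, $B_{X\text{-}VE}(e_{1,3})$, $B_{X\text{-}VE}(e_{1,4})$ and $B_{X,A,1}(v_1)$, and a type b labeling for $B_{X\text{-}VE}(v_1)$ (in the lower part), with duplications from $B_{X,A,2}(v_1)$. The blocks shown are
$B_{X\text{-}VE}(v_1) = s_1\, z_{1,1}\, z_{1,2}\, x_{1,1}\, e_{1,2,1}\, e_{1,2,2}\, z_{1,3}\, z_{1,4}\, x_{1,2}\, e_{1,3,1}\, e_{1,3,2}\, z_{1,5}\, z_{1,6}\, x_{1,3}\, e_{1,4,1}\, e_{1,4,2}\, z_{1,7}\, z_{1,8}$,
$B_{X,A,1}(v_1) = w_{1,1}\, z_{1,1}\, z_{1,2}\, w_{1,2}\, z_{1,3}\, z_{1,4}\, w_{1,3}\, z_{1,5}\, z_{1,6}\, w_{1,4}\, z_{1,7}\, z_{1,8}$,
$B_{X,A,2}(v_1) = u_{1,1}\, z_{1,2}\, x_{1,1}\, u_{1,2}\, e_{1,2,1}\, e_{1,2,2}\, z_{1,3}\, u_{1,3}\, z_{1,4}\, x_{1,2}\, u_{1,4}\, e_{1,3,1}\, e_{1,3,2}\, z_{1,5}\, u_{1,5}\, z_{1,6}\, x_{1,3}\, u_{1,6}\, e_{1,4,1}\, e_{1,4,2}\, z_{1,7}$.]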

□ 

Now, we define the cost $c$ of labeling the aligned genome $X$ (recall that $Y$ does not need any labeling). Given an integer $z \ge 1$, the cost of a duplication of length $z$ is $c(D(z)) = 1$, while the cost of a loss of length $z$ is $c(L(z)) = z$.

Before giving the details of the proof, we give a high-level description of the reduction. We will show that each block $B_{X\text{-}VE}(v_i)$ can be labeled essentially in two possible ways (see Remark 2.1 and Example 2.1):

1. with a type a labeling, defining maximal duplications from $B_{X\text{-}VE}(e_{i,j})$, $B_{X\text{-}VE}(e_{i,h})$, $B_{X\text{-}VE}(e_{i,k})$, $B_{X,A,1}(v_i)$ to $B_{X\text{-}VE}(v_i)$; a type a labeling is the optimal labeling of $B_{X\text{-}VE}(v_i)$ (see Lemma 2.3);

2. with a type b labeling, defining maximal duplications from the block $B_{X,A,2}(v_i)$ to $B_{X\text{-}VE}(v_i)$; a type b labeling is a suboptimal labeling of $B_{X\text{-}VE}(v_i)$ (see Lemma 2.3).
Thanks to the properties of the block $B_{X\text{-}VE}(e_{i,j})$ (see Remark 2.2 and Lemma 2.4), we will be able to relate these two types of labelings to a vertex cover of $G$ (see Lemma 2.5 and Lemma 2.6): a type b labeling for $B_{X\text{-}VE}(v_i)$ corresponds to a vertex $v_i$ in a vertex cover $V'$ of $G$, and a type a labeling for $B_{X\text{-}VE}(v_i)$ corresponds to a vertex $v_i$ in $V \setminus V'$.

Now, we give the details of the reduction. First, we introduce some preliminary properties of $X$ and $Y$.

Remark 2.1. Given a cubic graph $G = (V, E)$, let $v_i$ be a vertex of $V$ such that $\{v_i, v_j\}$, $\{v_i, v_h\}$, $\{v_i, v_k\}$ are the first, the second and the third edges of $v_i$, respectively. Let $(X, Y)$ be the corresponding instance of MLA. The following labeling of $B_{X\text{-}VE}(v_i)$ (denoted as a type a labeling for $B_{X\text{-}VE}(v_i)$) has a cost of 7 (it consists of 7 duplications):

• four duplications coming from the block $B_{X,A,1}(v_i)$, for the substrings $z_{i,2p-1}\, z_{i,2p}$, $1 \le p \le 4$;

• three duplications coming from the blocks $B_{X\text{-}VE}(e_{i,j})$ (for the substring $i\text{-}enc_{i,j}$), $B_{X\text{-}VE}(e_{i,h})$ (for the substring $i\text{-}enc_{i,h}$) and $B_{X\text{-}VE}(e_{i,k})$ (for the substring $i\text{-}enc_{i,k}$).

The following labeling of $B_{X\text{-}VE}(v_i)$ (denoted as a type b labeling for $B_{X\text{-}VE}(v_i)$) has a cost of 8 (it consists of 6 duplications and 2 losses):

• six duplications from $B_{X,A,2}(v_i)$ to $B_{X\text{-}VE}(v_i)$ (for the substrings $z_{i,2}\; i\text{-}enc_{i,j}[l]$, $i\text{-}enc_{i,j}[r]\; z_{i,3}$, $z_{i,4}\; i\text{-}enc_{i,h}[l]$, $i\text{-}enc_{i,h}[r]\; z_{i,5}$, $z_{i,6}\; i\text{-}enc_{i,k}[l]$, $i\text{-}enc_{i,k}[r]\; z_{i,7}$);

• two losses for the two characters $z_{i,1}$ and $z_{i,8}$.

Notice that in a type b labeling for $B_{X\text{-}VE}(v_i)$ there is no duplication into $B_{X\text{-}VE}(v_i)$ from substrings of $B_{X\text{-}VE}(e_{i,j})$, $B_{X\text{-}VE}(e_{i,h})$, $B_{X\text{-}VE}(e_{i,k})$.
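The costs stated in Remark 2.1 can be checked mechanically on the vertex $v_1$ of Example 2.1. The sketch below (hypothetical helper names; blocks written as lists of symbol strings) verifies that each of the two labelings tiles the 17 unmatched characters of $B_{X\text{-}VE}(v_1)$ in order, that every duplicated piece occurs in its claimed source block, and sums the costs with $c(D(z)) = 1$ and $c(L(z)) = z$. It only checks these two given labelings; it does not prove optimality.

```python
def blk(text):
    """Split a space-separated block description into a list of symbols."""
    return text.split()


U = blk("z_1_1 z_1_2 x_1_1 e_1_2_1 e_1_2_2 z_1_3 z_1_4 x_1_2 e_1_3_1 "
        "e_1_3_2 z_1_5 z_1_6 x_1_3 e_1_4_1 e_1_4_2 z_1_7 z_1_8")  # unmatched part
A1 = blk("w_1_1 z_1_1 z_1_2 w_1_2 z_1_3 z_1_4 w_1_3 z_1_5 z_1_6 w_1_4 z_1_7 z_1_8")
A2 = blk("u_1_1 z_1_2 x_1_1 u_1_2 e_1_2_1 e_1_2_2 z_1_3 u_1_3 z_1_4 x_1_2 u_1_4 "
         "e_1_3_1 e_1_3_2 z_1_5 u_1_5 z_1_6 x_1_3 u_1_6 e_1_4_1 e_1_4_2 z_1_7")
E12 = blk("s_e_1_2 x_1_1 e_1_2_1 e_1_2_2 x_2_1")
E13 = blk("s_e_1_3 x_1_2 e_1_3_1 e_1_3_2 x_3_1")
E14 = blk("s_e_1_4 x_1_3 e_1_4_1 e_1_4_2 x_4_1")
LOSS = None  # marks a loss instead of a source block


def occurs(block, piece):
    n = len(piece)
    return any(block[k:k + n] == piece for k in range(len(block) - n + 1))


def cost(labeling, target):
    """labeling: list of (source block or LOSS, piece); returns its cost."""
    if [c for _, piece in labeling for c in piece] != target:
        raise ValueError("pieces must tile the unmatched characters in order")
    total = 0
    for source, piece in labeling:
        if source is LOSS:
            total += len(piece)  # c(L(z)) = z
        elif occurs(source, piece):
            total += 1           # c(D(z)) = 1
        else:
            raise ValueError(f"{piece} does not occur in its source block")
    return total


type_a = [(A1, U[0:2]), (E12, U[2:5]), (A1, U[5:7]), (E13, U[7:10]),
          (A1, U[10:12]), (E14, U[12:15]), (A1, U[15:17])]
type_b = [(LOSS, U[0:1]), (A2, U[1:3]), (A2, U[3:6]), (A2, U[6:8]),
          (A2, U[8:11]), (A2, U[11:13]), (A2, U[13:16]), (LOSS, U[16:17])]
print(cost(type_a, U), cost(type_b, U))  # 7 8
```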

Remark 2.2. Let $G = (V, E)$ be a cubic graph, and let $\{v_i, v_j\} \in E$, with $i < j$, be the $p$-th edge of $v_i$, $1 \le p \le 3$, and the $q$-th edge of $v_j$, $1 \le q \le 3$. Let $(X, Y)$ be the corresponding instance of MLA. The following labeling of $B_{X\text{-}VE}(e_{i,j})$ has cost 2:

• one duplication coming either from $B_{X\text{-}VE}(v_i)$ (for the substring $x_{i,p}\, e_{i,j,1}\, e_{i,j,2}$) or from $B_{X\text{-}VE}(v_j)$ (for the substring $e_{i,j,1}\, e_{i,j,2}\, x_{j,q}$);

• one loss either for the last character of $B_{X\text{-}VE}(e_{i,j})$ or for the second character of $B_{X\text{-}VE}(e_{i,j})$ (that is, the unmatched character of $B_{X\text{-}VE}(e_{i,j})$ not involved in the duplication).

Now, we are ready to show that a type a labeling is the only optimal labeling for $B_{X\text{-}VE}(v_i)$.

Lemma 2.3. Let $G = (V, E)$ be an instance of MVCC and let $(X, Y)$ be the corresponding instance of MLA. Then, given a block $B_{X\text{-}VE}(v_i)$, with $v_i \in V$: (1) any feasible labeling of $B_{X\text{-}VE}(v_i)$ has a cost of at least 7; (2) if a labeling has a cost of 7, then such a labeling is a type a labeling for $B_{X\text{-}VE}(v_i)$.
Proof. The proof that any feasible labeling of $B_{X\text{-}VE}(v_i)$ has a cost of at least 7 follows from a simple counting argument. Notice that $B_{X\text{-}VE}(v_i)$ contains 17 unmatched characters and that, by construction, each duplication labeling $B_{X\text{-}VE}(v_i)$ has length at most 3. By construction, any feasible labeling of $B_{X\text{-}VE}(v_i)$ can define a duplication of length at most 2 that contains the leftmost unmatched character of $B_{X\text{-}VE}(v_i)$. The same property holds for the rightmost unmatched character of $B_{X\text{-}VE}(v_i)$. Hence, consider the unmatched characters of $B_{X\text{-}VE}(v_i)$ that are not labeled by the operations labeling the leftmost and the rightmost unmatched characters of $B_{X\text{-}VE}(v_i)$. There are at least 13 such characters and, since each duplication has length at most 3 (and a loss of length $z$ costs $z$), a cost of at least $\lceil 13/3 \rceil = 5$ is required for labeling them. This implies an overall cost of at least 7 for any labeling of $B_{X\text{-}VE}(v_i)$.
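In formula form, the counting argument reads as follows; this merely restates the bound above, using that the operations labeling the two extreme unmatched characters $z_{i,1}$ and $z_{i,8}$ cover at most two characters each, and that every other unit of cost covers at most three characters.

```latex
c(L) \;\ge\; \underbrace{1 + 1}_{\text{operations on } z_{i,1} \text{ and } z_{i,8}}
\;+\; \left\lceil \frac{17 - 2 - 2}{3} \right\rceil
\;=\; 2 + \left\lceil \frac{13}{3} \right\rceil \;=\; 2 + 5 \;=\; 7
```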

Now, we prove that if a feasible labeling of $B_{X\text{-}VE}(v_i)$ has a cost of 7, then such a labeling must be a type a labeling of $B_{X\text{-}VE}(v_i)$. First, notice that if a labeling of $B_{X\text{-}VE}(v_i)$ contains only duplications from $B_{X\text{-}VE}(e_{i,j})$, $B_{X\text{-}VE}(e_{i,h})$, $B_{X\text{-}VE}(e_{i,k})$, $B_{X,A,1}(v_i)$, then it has a cost of 7 if and only if it is a type a labeling. Indeed, a type a labeling is the only labeling that consists only of maximal duplications from $B_{X\text{-}VE}(e_{i,j})$, $B_{X\text{-}VE}(e_{i,h})$, $B_{X\text{-}VE}(e_{i,k})$, $B_{X,A,1}(v_i)$ to $B_{X\text{-}VE}(v_i)$.

Now, assume that a labeling of $B_{X\text{-}VE}(v_i)$ contains only duplications from $B_{X,A,2}(v_i)$. A type b labeling is the only labeling of $B_{X\text{-}VE}(v_i)$ that consists only of maximal duplications from $B_{X,A,2}(v_i)$; hence any labeling of $B_{X\text{-}VE}(v_i)$ that contains only duplications from $B_{X,A,2}(v_i)$ requires a cost of at least 8.

Hence, assume that a labeling $L$ of $B_{X\text{-}VE}(v_i)$ contains duplications from $B_{X,A,2}(v_i)$ and from some of $B_{X\text{-}VE}(e_{i,j})$, $B_{X\text{-}VE}(e_{i,h})$, $B_{X\text{-}VE}(e_{i,k})$, $B_{X,A,1}(v_i)$. Consider a substring $s$ of $B_{X\text{-}VE}(v_i)$ labeled by a duplication from a substring $t$ of $B_{X,A,2}(v_i)$. First, notice that if this duplication is not maximal, we can extend it to a maximal duplication from a substring $t'$ that includes $t$ to a substring $s'$ that includes $s$, without increasing the cost of the labeling. Notice that then $s'$ is labeled as in a type b labeling.

Now, we show how to modify $L$ into a labeling $L'$, which is a type b labeling, without increasing the cost of the solution. $L'$ defines a labeling of $B_{X\text{-}VE}(v_i)$ by iterating the following procedure. Denote by $s^*$ the substring of $B_{X\text{-}VE}(v_i)$ already labeled by $L'$ in the procedure. Initially $s^* = s'$, that is, $L'$ labels the string $s'$ as a duplication from $t'$. Then, consider the unmatched character $\alpha$ of $B_{X\text{-}VE}(v_i)$ on the left of $s^*$ (if it exists). If $\alpha \neq z_{i,1}$, $L'$ defines a maximal duplication from a substring of $B_{X,A,2}(v_i)$ to a substring $s''$ on the left of $s^*$ (as in a type b labeling); if $\alpha = z_{i,1}$, $L'$ labels $\alpha$ as a loss. Similarly, consider the character $\beta$ on the right of $s^*$ (if it exists). If $\beta \neq z_{i,8}$, $L'$ defines a maximal duplication from a substring of $B_{X,A,2}(v_i)$ to a substring of $B_{X\text{-}VE}(v_i)$ on the right of $s^*$ (as in a type b labeling); if $\beta = z_{i,8}$, $L'$ labels $\beta$ as a loss.

Iterating this procedure, we define a labeling $L'$ whose cost is not greater than the cost of $L$, since at each step of the iteration the cost of $L'$ with respect to $L$ is never increased. Indeed, consider an unmatched character adjacent to $s^*$, and assume w.l.o.g. that this character $\alpha$ is on the left of $s^*$. $L$ labels $\alpha$ with some label whose cost has not been considered in previous iterations. At each step the procedure defines a duplication of maximal length having $\alpha$ as right endpoint. Indeed, by construction, maximal duplications from $B_{X,A,2}(v_i)$ and maximal duplications from $B_{X\text{-}VE}(e_{i,j})$, $B_{X\text{-}VE}(e_{i,h})$, $B_{X\text{-}VE}(e_{i,k})$, $B_{X,A,1}(v_i)$ have different start and end positions in $B_{X\text{-}VE}(v_i)$ (except for the rightmost and the leftmost unmatched characters of $B_{X\text{-}VE}(v_i)$).

Since $L'$ is a type b labeling of $B_{X\text{-}VE}(v_i)$, it has a cost of 8 (see Remark 2.1), and since the cost of $L$ is not smaller than the cost of $L'$, it follows that $L$ has a cost of at least 8. □

Now, we prove a property of the labeling of a block $B_{X\text{-}VE}(e_{i,j})$.

Lemma 2.4. Let $G = (V, E)$ be an instance of MVCC and let $(X, Y)$ be the corresponding instance of MLA. Then each feasible labeling of $B_{X\text{-}VE}(e_{i,j})$ has a cost of at least 2; furthermore, if a labeling of $B_{X\text{-}VE}(e_{i,j})$ has a cost of 2, then $B_{X\text{-}VE}(e_{i,j})$ is labeled with one duplication from $B_{X\text{-}VE}(v_i)$ or with one duplication from $B_{X\text{-}VE}(v_j)$.

Proof. Consider the block $B_{X\text{-}VE}(e_{i,j})$. By construction, $B_{X\text{-}VE}(e_{i,j})$ contains 4 unmatched characters and there is no other substring in $X$ identical to its unmatched substring $x_{i,p}\, e_{i,j,1}\, e_{i,j,2}\, x_{j,q}$; it follows that any labeling of $B_{X\text{-}VE}(e_{i,j})$ requires a cost of at least 2.

Now, assume that $B_{X\text{-}VE}(e_{i,j})$ is not labeled by a duplication from $B_{X\text{-}VE}(v_i)$ or from $B_{X\text{-}VE}(v_j)$. It follows that either each character of $B_{X\text{-}VE}(e_{i,j})$ is labeled as a loss (hence the cost of such a labeling is exactly 4) or the substring $e_{i,j,1}\, e_{i,j,2}$ of $B_{X\text{-}VE}(e_{i,j})$ is labeled as a duplication from $B_{X,A,2}(v_i)$. By construction, this implies that the leftmost unmatched character of $B_{X\text{-}VE}(e_{i,j})$ is labeled either by a duplication of length 1 or by a loss and, similarly, the rightmost unmatched character of $B_{X\text{-}VE}(e_{i,j})$ is labeled either by a duplication of length 1 or by a loss. Hence this labeling of $B_{X\text{-}VE}(e_{i,j})$ has a cost of at least 3. □

Now, we are ready to prove the two main properties of the reduction in Lemma 2.5 and in Lemma 2.6.

Lemma 2.5. Let $G$ be an instance of MVCC and let $(X, Y)$ be the corresponding instance of MLA. Then, given a vertex cover $V' \subseteq V$ of $G$, we can compute in polynomial time a solution of MLA over instance $(X, Y)$ of cost at most $8|V'| + 7|V \setminus V'| + 2|E|$.

Proof. Let $V'$ be a vertex cover of $G$. We define a solution of MLA over instance $(X, Y)$ by labeling $X$. First we define the following labeling of the block $B_{X\text{-}VE}(v_i)$, for each $v_i \in V$:

• for each $v_i \in V'$, define a type b labeling for the corresponding block $B_{X\text{-}VE}(v_i)$ (hence the labeling of this block has a cost of 8, see Remark 2.1);

• for each $v_i \in V \setminus V'$, define a type a labeling for the corresponding block $B_{X\text{-}VE}(v_i)$ (hence the labeling of this block has a cost of 7, see Remark 2.1).
Now, for each $\{v_i, v_j\} \in E$ (assume w.l.o.g. $i < j$), we define a labeling of the corresponding block $B_{X\text{-}VE}(e_{i,j})$ as follows:

• if $v_i \in V'$, define a duplication from $B_{X\text{-}VE}(v_i)$ to $B_{X\text{-}VE}(e_{i,j})$ (more precisely, a duplication of $i\text{-}enc_{i,j}$ for the leftmost three unmatched characters of $B_{X\text{-}VE}(e_{i,j})$) and a loss for the rightmost unmatched character of $B_{X\text{-}VE}(e_{i,j})$;

• otherwise (notice that in this case $v_j$ must be in $V'$), define a duplication from $B_{X\text{-}VE}(v_j)$ to $B_{X\text{-}VE}(e_{i,j})$ (more precisely, a duplication of $j\text{-}enc_{i,j}$ for the rightmost three unmatched characters of $B_{X\text{-}VE}(e_{i,j})$) and a loss for the leftmost unmatched character of $B_{X\text{-}VE}(e_{i,j})$.

Each such labeling of an edge block has cost 2 (see Remark 2.2). Notice that, since $V'$ is a vertex cover of $G$, at least one of $v_i$, $v_j$ belongs to $V'$; hence this labeling is always possible.

Now, we show that this labeling is feasible (that is, no cycle is induced by the labeling). By construction, a block $B_{X\text{-}VE}(v_i)$ has a duplication coming from a block $B_{X\text{-}VE}(e_{i,j})$ only if no other block of $X$ has a duplication coming from $B_{X\text{-}VE}(v_i)$. In case a block $B_{X\text{-}VE}(v_i)$ has a duplication coming from a block $B_{X\text{-}VE}(e_{i,j})$, the labeling of $B_{X\text{-}VE}(e_{i,j})$ defines a duplication from $B_{X\text{-}VE}(v_j)$ to $B_{X\text{-}VE}(e_{i,j})$, and $B_{X\text{-}VE}(v_j)$ has duplications coming only from $B_{X,A,2}(v_j)$, which does not need any labeling and hence has no incoming arc. Hence, no cycle is induced by this labeling. □
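At the level of blocks, the labeling built in this proof and its feasibility can be sketched as follows. The sketch abstracts every block to a node (hypothetical node names: `("VE", v)` for $B_{X\text{-}VE}(v)$, `("E", (i, j))` for $B_{X\text{-}VE}(e_{i,j})$, `("A1", v)` and `("A2", v)` for the A-part blocks), records which blocks each labeled block copies from, and uses Python's standard `graphlib` module to detect cycles; the cost is computed from the per-block costs of Remarks 2.1 and 2.2 rather than from individual operations.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import combinations


def labeling_from_cover(vertices, edges, cover):
    """Block-level view of the labeling of Lemma 2.5 (edges given as (i, j), i < j)."""
    types = {v: "b" if v in cover else "a" for v in vertices}
    cost = sum(8 if t == "b" else 7 for t in types.values()) + 2 * len(edges)
    sources = {}  # labeled block -> set of blocks it copies from
    for v, t in types.items():
        if t == "b":
            sources[("VE", v)] = {("A2", v)}
        else:
            incident = {("E", e) for e in edges if v in e}
            sources[("VE", v)] = {("A1", v)} | incident
    for i, j in edges:
        donor = i if i in cover else j  # an endpoint that lies in the cover
        sources[("E", (i, j))] = {("VE", donor)}
    return types, cost, sources


def is_acyclic(sources):
    """static_order() raises CycleError if the copy relation has a cycle."""
    try:
        tuple(TopologicalSorter(sources).static_order())
        return True
    except CycleError:
        return False


V = [1, 2, 3, 4]
E = list(combinations(V, 2))  # the graph of Example 2.1
types, cost, sources = labeling_from_cover(V, E, {1, 2, 3})
print(cost, is_acyclic(sources))  # 43 True
```

If the set passed as `cover` is not a vertex cover (for instance `{1}`), some edge block copies from a type a vertex block that in turn copies from it, and the check reports a cycle, which mirrors the role of the cover in the proof.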

Lemma 2.6. Let $G$ be an instance of MVCC and let $(X, Y)$ be the corresponding instance of MLA. Then, given a feasible labeling of $(X, Y)$ of cost $8p + 7(|V| - p) + 2|E|$, we can compute in polynomial time a vertex cover of $G$ of size at most $p$.

Proof. Let $L$ be a feasible labeling of $(X, Y)$ of cost $8p + 7(|V| - p) + 2|E|$. First, we consider the labeling of each block $B_{X\text{-}VE}(v_i)$, with $v_i \in V$. By Lemma 2.3 we can assume that $B_{X\text{-}VE}(v_i)$ has either a type a labeling or a type b labeling. Indeed, if the cost of the labeling of $B_{X\text{-}VE}(v_i)$ is 7, then by Lemma 2.3 it must be a type a labeling. If the cost of the labeling of $B_{X\text{-}VE}(v_i)$ is greater than 7, then we can modify (in polynomial time) the labeling of $B_{X\text{-}VE}(v_i)$ so that it is a type b labeling, without increasing its cost. Notice that this modification does not induce any cycle in $L$, since it defines duplications from $B_{X,A,2}(v_i)$ to $B_{X\text{-}VE}(v_i)$, and $B_{X,A,2}(v_i)$ does not need any labeling, hence it has no incoming arc.

Now, consider a block $B_{X\text{-}VE}(e_{i,j})$, with $\{v_i, v_j\} \in E$. We show that we can assume that at least one of $B_{X\text{-}VE}(v_i)$, $B_{X\text{-}VE}(v_j)$ has a type b labeling in $L$. Assume to the contrary that $B_{X\text{-}VE}(v_i)$ and $B_{X\text{-}VE}(v_j)$ both have a type a labeling. Then by Lemma 2.4 the labeling of $B_{X\text{-}VE}(e_{i,j})$ has a cost of at least 3, as $B_{X\text{-}VE}(e_{i,j})$ cannot contain duplications from $B_{X\text{-}VE}(v_i)$ or $B_{X\text{-}VE}(v_j)$, otherwise $L$ would induce a cycle and would not be feasible. Now, starting from $L$, we compute in polynomial time a feasible labeling $L'$ such that $c(L') \le c(L)$, as follows: we define a type b labeling for one of $B_{X\text{-}VE}(v_i)$, $B_{X\text{-}VE}(v_j)$, w.l.o.g. $B_{X\text{-}VE}(v_i)$, and we define a duplication from $B_{X\text{-}VE}(v_i)$ to $B_{X\text{-}VE}(e_{i,j})$ (for the substring $i\text{-}enc_{i,j}$) and a loss for the character $x_{j,q}$, $1 \le q \le 3$, of $B_{X\text{-}VE}(e_{i,j})$ not labeled as a duplication from $B_{X\text{-}VE}(v_i)$. Notice that, since $L$ is feasible, the labeling $L'$ is also feasible: $B_{X\text{-}VE}(v_i)$ has a type b labeling, hence the duplications of $B_{X\text{-}VE}(v_i)$ come from $B_{X,A,2}(v_i)$, which has no label and no incoming arc. Furthermore, notice that $c(L') \le c(L)$, since we have increased by 1 the cost of the labeling of $B_{X\text{-}VE}(v_i)$, changing it from a type a labeling to a type b labeling, while we have decreased by at least 1 the cost of the labeling of $B_{X\text{-}VE}(e_{i,j})$.

As a consequence, we can assume that $L$ is a feasible labeling with the following properties: (1) each block $B_{X\text{-}VE}(v_i)$ has either a type a labeling or a type b labeling; (2) for each block $B_{X\text{-}VE}(e_{i,j})$, at least one of $B_{X\text{-}VE}(v_i)$, $B_{X\text{-}VE}(v_j)$ has a type b labeling. We define a vertex cover $V'$ of $G$ as follows:

$$V' = \{ v_i : B_{X\text{-}VE}(v_i) \text{ has a type b labeling} \}$$

Since for each $B_{X\text{-}VE}(e_{i,j})$ at least one of $B_{X\text{-}VE}(v_i)$, $B_{X\text{-}VE}(v_j)$ has a type b labeling, it follows that $V'$ is a vertex cover of $G$. Furthermore, each type b (type a) labeling of a vertex block costs 8 (7, respectively) and each edge block costs at least 2 (Lemma 2.4); since the cost of $L$ is at most $8p + 7(|V| - p) + 2|E|$, it follows that $|V'| \le p$. □

Theorem 2.7. MLA is APX-hard. 

Proof. The proof follows from Lemma 2.5, from Lemma 2.6 and from the observation that in a cubic graph $|E| = \frac{3}{2}|V|$ and a vertex cover has size at least